#5 Basic NLP with Command-line
Faculty of Humanities and Social Sciences
University of Lucerne
23 March 2024
Historical development of Swiss party politics (Tagesanzeiger)
File formats: .txt, .csv, .tsv, .xml
Absolute frequency = n_occurrences
Relative frequency = n_occurrences / n_total_words
.tsv file
Print the following sentence in your command line using echo.
How many words are in this sentence? Use the pipe operator | to pass the output above to the command wc.
Match the word computational and colorize its occurrences in the sentence using egrep.
Get the frequencies of each word in this sentence using tr and other commands.
Save the frequencies into a tsv file, open it in a spreadsheet program (e.g., Excel, Numbers) and compute the relative frequency per word.
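One possible way to chain these steps is sketched below. The sentence and the output file name `frequencies.tsv` are placeholders chosen for illustration, not the ones from the slide, and `grep -E` is used as the modern equivalent of `egrep`.

```shell
# Placeholder sentence (an assumption; substitute the sentence from the exercise).
sentence="Text analysis is fun and computational text analysis is fast"

# 1. Print the sentence.
echo "$sentence"

# 2. Count the words by piping the output to wc (-w counts words).
#    tr -d ' ' strips the padding that some wc versions add.
n_words=$(echo "$sentence" | wc -w | tr -d ' ')
echo "Number of words: $n_words"

# 3. Highlight every occurrence of "computational".
echo "$sentence" | grep -E --color=auto "computational"

# 4. Put one word per line, lowercase everything, then count identical lines.
echo "$sentence" | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn

# 5. Same pipeline, but write "word<TAB>count" rows into a tsv file.
echo "$sentence" | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c |
  awk '{ print $2 "\t" $1 }' > frequencies.tsv
```

Lowercasing before counting is a design choice: without it, `Text` and `text` would be counted as two different words.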
🤓 Publishing code and data is key to open science.
We are having the next two classes via Zoom.
Update your local copy of the course materials with git pull. If you haven't cloned the repository yet, follow section 5 of the installation guide.
The party programmes are located in KED2024/materials/data/swiss_party_programmes/txt. Change into that directory using cd.
Have a look at some of the files using more.
Compare the absolute frequencies of single terms or multi-word phrases of your choice (e.g., Ökologie, Sicherheit, Schweiz)…
Use the file names as a filter to get various aggregations of the word counts.
Pick terms of your interest and look at their contextual use by extracting relevant passages. Does the usage differ across parties or time?
💡 Share your insights with the class using Etherpad.
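The counting, filtering and context steps above could look roughly like the sketch below. It builds a tiny stand-in corpus so that it runs anywhere; the file names (`partyA_1999.txt` and so on) and their contents are made up, and the naming scheme of the real files in the course repository may differ, so adapt the glob patterns accordingly.

```shell
# Toy stand-in corpus (hypothetical file names and contents; in class you
# would cd into the real txt directory instead of creating these files).
corpus=$(mktemp -d)
printf 'Sicherheit und Ökologie.\nMehr Sicherheit für die Schweiz.\n' > "$corpus/partyA_1999.txt"
printf 'Ökologie zuerst.\nDie Schweiz braucht Ökologie.\n' > "$corpus/partyB_1999.txt"
printf 'Sicherheit, Sicherheit, Sicherheit.\n' > "$corpus/partyA_2019.txt"
cd "$corpus"

# Absolute frequency of one term across all files:
# -o prints each match on its own line, -h hides file names, wc -l counts lines.
total=$(grep -o -h "Sicherheit" *.txt | wc -l | tr -d ' ')
echo "Sicherheit overall: $total"

# Use the file names as a filter: only one party, or only one year.
party_a=$(grep -o -h "Sicherheit" partyA_*.txt | wc -l | tr -d ' ')
year_1999=$(grep -o -h "Sicherheit" *_1999.txt | wc -l | tr -d ' ')
echo "Party A: $party_a, year 1999: $year_1999"

# Per-file counts: keep only the file name of each match, then count duplicates.
grep -o "Ökologie" *.txt | cut -d: -f1 | sort | uniq -c

# Context: show each matching line plus one line before and after it.
grep -C 1 --color=auto "Schweiz" *.txt
```

Counting with `grep -o ... | wc -l` rather than `grep -c` matters here, because `grep -c` counts matching lines and would miss repeated occurrences within one line.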
tsv dataset. Compute the relative word frequency instead of the absolute frequency using any spreadsheet software (e.g., Excel). Are your conclusions still valid after accounting for the size?
Pro Tip 🤓: Use egrep to look up commands in the .md course slides.
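As an alternative to a spreadsheet, the relative frequency (n_occurrences / n_total_words) could also be computed directly on the command line. The sketch below assumes a two-column file in `word<TAB>count` format; the file names `counts.tsv` and `relative.tsv` and the numbers in them are invented for illustration.

```shell
# Toy input in "word<TAB>count" format (made-up numbers for illustration).
printf 'schweiz\t6\nsicherheit\t3\nökologie\t1\n' > counts.tsv

# The file is read twice: the first pass (NR == FNR) sums all counts,
# the second pass divides each count by that total and appends the
# result as a third column.
awk -F'\t' 'NR == FNR { total += $2; next }
            { printf "%s\t%d\t%.2f\n", $1, $2, $2 / total }' counts.tsv counts.tsv > relative.tsv

cat relative.tsv
```

Note that the total here is the sum of the counts in the table. If the table only holds a few selected terms, divide by the word count of the whole document (for example from `wc -w`) instead.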
When you look for useful primers on Bash, consider the following resources: